A DHT-based Backup System
Abstract
Distributed hash tables (DHTs) have been proposed as a way to simplify the construction of large-scale distributed applications (e.g. [1, 6]). DHTs are completely decentralized systems that provide block storage on a changing collection of nodes spread throughout the Internet. Each block is identified by a unique key. DHTs spread the load of storing and serving blocks across all of the active nodes and keep the blocks available as nodes join and leave the system.
This paper presents the design and implementation of a cooperative off-site backup system, Venti-DHash. Venti-DHash is based on a DHT infrastructure and is designed to support recovery of data after a disaster by keeping regular snapshots of file systems distributed off-site, on peers on the Internet. Whereas conventional backup systems incur significant equipment costs, manual effort and high administrative overhead, we hope that a distributed backup system can alleviate these problems, making backups easy and feasible. By building this system on top of a DHT, the backup application inherits the properties of the DHT and serves to evaluate the feasibility of using a DHT to build large-scale applications.
The backup system is based around the Venti archival storage system [9], with the storage back-end replaced by the DHash distributed hash table [5]. Venti-DHash operates as an archiver that takes complete file system snapshots at the block level. Each unique block is stored only once, even across snapshots. DHash is used to balance storage and network load, as well as to provide adequate availability of blocks. A number of changes were made to the internals of DHash in order to meet our desired performance and availability goals. Our improved version of DHash is a DHT with good read and write performance and five nines (99.999%) of availability per block, assuming an average node reliability of 90%. The resulting system is now being tested by running backups of our primary file server.
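The statement that each unique block is stored only once, even across snapshots, can be illustrated with a small content-addressed store. This is a minimal sketch under stated assumptions, not the Venti-DHash implementation: the `BlockStore` class, the `snapshot` and `restore` helpers, the 8192-byte block size, the flat list of keys per snapshot, and the choice of SHA-1 as the key function are all illustrative, and an in-memory dictionary stands in for the DHT.

```python
import hashlib


class BlockStore:
    """Toy in-memory stand-in for a content-addressed block store (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # key (hex digest of the contents) -> block bytes

    def put(self, data: bytes) -> str:
        # Assumption: the key is the SHA-1 digest of the block contents.
        key = hashlib.sha1(data).hexdigest()
        # Identical contents map to the same key, so a repeated block
        # does not take up additional space.
        self.blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


def snapshot(store: BlockStore, image: bytes, block_size: int = 8192) -> list:
    """Split an image into fixed-size blocks, store them, and return their keys."""
    return [store.put(image[i:i + block_size])
            for i in range(0, len(image), block_size)]


def restore(store: BlockStore, keys: list) -> bytes:
    """Rebuild an image from the list of block keys recorded by snapshot()."""
    return b"".join(store.get(k) for k in keys)


# Two snapshots that share their first block.
store = BlockStore()
day1 = b"A" * 8192 + b"B" * 8192
day2 = b"A" * 8192 + b"C" * 8192
keys1 = snapshot(store, day1)
keys2 = snapshot(store, day2)
```

After both snapshots the store holds three blocks rather than four, because the shared first block has the same key in `keys1` and `keys2`, and `restore(store, keys2)` returns the second image unchanged.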
The rest of the paper is structured as follows. Section 2 briefly surveys related work. Section 3 presents the design of the backup system. Section 4 describes how DHash was changed to achieve the desired performance and availability goals. Section 5 describes some preliminary performance benchmarks and analysis we have conducted on our prototype. Finally, we conclude in Section 6.
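The availability goal of five nines per block at 90% node reliability can be sanity-checked with a short calculation. The text above does not say which redundancy scheme DHash uses, so the sketch below makes two simplifying assumptions that are not from the source: each block is kept as whole copies on distinct nodes, and nodes fail independently. The function names are illustrative. Under these assumptions a block is unavailable only when every copy is unreachable, so its availability with $r$ copies is $1 - (1 - p)^r$.

```python
from fractions import Fraction


def block_availability(node_reliability: Fraction, copies: int) -> Fraction:
    """Probability that at least one of `copies` independent copies is reachable."""
    return 1 - (1 - node_reliability) ** copies


def copies_needed(node_reliability: Fraction, target: Fraction) -> int:
    """Smallest number of copies whose availability meets the target."""
    copies = 1
    while block_availability(node_reliability, copies) < target:
        copies += 1
    return copies


# Exact rational arithmetic avoids floating-point rounding at the boundary.
p = Fraction(9, 10)               # average node reliability of 90%
target = Fraction(99999, 100000)  # five nines
needed = copies_needed(p, target)
```

With these inputs `needed` is 5: four copies give 0.9999 and five give exactly 0.99999. A different redundancy scheme would reach the same target with a different storage overhead.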
Similar resources
Effect of five alpha dihydrotestosterone (5α-DHT) on cytokine production by peritoneal macrophages of NZB/BALBc mice
One of the mechanisms involved in the regulation of the immune system by steroid hormones could be the monocytic-macrophage system. In this study the effect of the male hormone 5α-DHT on cytokine release by peritoneal macrophages (mΦ) of male and female NZB/BALBc mice was investigated. Macrophages from male mice activated with LPS produced a greater amount of IL-1β (21.8%) (p<0.05) and IL-...
Flexible replica placement for optimized P2P backup on heterogeneous, unreliable machines
P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on organization’s workstations should be cheaper as existing hardware is used; more efficient as there is no single bottleneck server; and more reliable as the machines can be geographically dispersed. We present an architecture of a p2p backu...
Building a reliable and high-performance content-based publish/subscribe system
Provisioning reliability in a high-performance content-based publish/subscribe system is a challenging problem. The inherent complexity of content-based routing makes message loss detection and recovery, and network state recovery extremely complicated. Existing proposals either try to reduce the complexity of handling failures in a traditional network architecture, which only partially address...
Self Chord-Achieving Load Balancing In Peer To Peer Network
Cloud computing technology has been widely applied in e-business and e-education. A cloud computing platform is a set of scalable, large-scale data server clusters that provides computing and storage services to customers. Cloud storage is a relatively basic and widely applied service that can provide users with stable, massive data storage space. Our research shows that the architecture of c...
Replica placement for p2p redundant data storage on unreliable, non-dedicated machines
P2P architecture appears to be a good fit for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on an organization’s workstations should be cheaper (as existing hardware is used) and more efficient (as there is no single bottleneck server). However, non-dedicated machines cause other challenges. Update propagation algorithms must take into acco...